SOM overview
Basic information
SOM: self-organizing map, also called a Kohonen map
Type: unsupervised machine learning technique that produces a low-dimensional (typically two-dimensional) representation of a higher-dimensional data set
Computational complexity: \(O(S^{2})\)
Original publication: Teuvo Kohonen, 1982
A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher-dimensional data set while preserving the topological structure of the data. For example, a data set with \(p\) variables measured in \(n\) observations could be represented as clusters of observations with similar values for the variables. These clusters could then be visualized as a two-dimensional "map" such that observations in proximal clusters have more similar values than observations in distal clusters. This can make high-dimensional data easier to visualize and analyze.
Pros:
- The data become easier to interpret, because the map combines dimensionality reduction with clustering on a grid.
- Self-Organizing Maps can handle several types of classification problems while also providing a useful summary of the data.
Cons:
- A SOM does not build a generative model of the data, so it does not describe how the data are generated.
- Self-Organizing Maps perform poorly on categorical data and worse still on mixed-type data.
- Training is comparatively slow, and the model is hard to retrain as the data slowly evolve.
Algorithm
- Randomize the node weight vectors in a map
- Randomly pick an input point
- Traverse each node in the map:
  - Compute the Euclidean distance between the input vector and the node's weight vector (a smaller distance means greater similarity)
  - Track the node that produces the smallest distance (this node is the best matching unit, BMU)
- Update the weight vectors of the nodes in the neighborhood of the BMU
- Increase the step counter \(s\) and repeat from step 2 while \(s\) is below the iteration limit
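The BMU search and the neighborhood update in the steps above are commonly written as follows. The symbols are notation assumed here for illustration, not taken from the notes: \(x\) is the randomly picked input, \(W_v(s)\) is the weight vector of node \(v\) at step \(s\), \(\alpha(s)\) is a decreasing learning rate, and \(\theta(u, v, s)\) is a neighborhood function that shrinks as the grid distance between the BMU \(u\) and node \(v\) grows.

\[
u = \arg\min_{v} \lVert x - W_v(s) \rVert
\]

\[
W_v(s+1) = W_v(s) + \theta(u, v, s)\,\alpha(s)\,\bigl(x - W_v(s)\bigr)
\]

Nodes close to the BMU on the grid move most of the way toward the input, while distant nodes barely move; this is what lets neighboring nodes end up with similar weight vectors.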
Simple example
Step 1: Randomize the node weight vectors in a map
- Randomly pick an input point
- Traverse each node in the map:
  - Compute the Euclidean distance between the input vector and the node's weight vector (a smaller distance means greater similarity)
  - Track the node that produces the smallest distance (this node is the best matching unit, BMU)
- Update the weight vectors of the nodes in the neighborhood of the BMU
- Increase the step counter \(s\) and repeat from step 2 while \(s\) is below the iteration limit
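The steps above can be sketched in plain Python on a toy data set. This is a minimal illustration under stated assumptions, not a reference implementation: the names `train_som` and `find_bmu`, the parameters `lr0` and `sigma0`, the Gaussian neighborhood, and the linear decay of learning rate and radius are all choices made here and do not come from the notes.

```python
import math
import random


def find_bmu(weights, x):
    """Return the grid coordinate of the node whose weights are closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows, cols, n_steps, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0  # assumed starting radius

    # Step 1: randomize the node weight vectors.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }

    for s in range(n_steps):
        # Step 2: randomly pick an input point.
        x = rng.choice(data)

        # Step 3: traverse the nodes and keep the one with the smallest
        # Euclidean distance to the input (the BMU).
        bmu = find_bmu(weights, x)

        # Assumed schedule: learning rate and radius decay linearly with s.
        frac = 1.0 - s / n_steps
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 1e-3)

        # Step 4: pull every node toward x, weighted by a Gaussian of its
        # grid distance to the BMU, so only the neighborhood moves noticeably.
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])
        # Step 5: the for loop increases s and stops at the iteration limit.

    return weights


# Toy data: two loose groups of 2-D points (made up for illustration).
data = [(0.10, 0.15), (0.15, 0.10), (0.20, 0.20),
        (0.80, 0.85), (0.85, 0.80), (0.90, 0.90)]

som = train_som(data, rows=3, cols=3, n_steps=500)
for point in data:
    print(point, "->", find_bmu(som, point))
```

Each printed line shows the grid cell that a point maps to. On a trained map, points with similar values are expected to land on the same or adjacent cells, which is the "map" view described in the overview.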